Power Tuning for HPC Jobs under Manufacturing Variations

Authors

Neha Gholkar

Frank Mueller

Barry Rountree

Abstract

As we approach the exascale era, power has become a primary bottleneck. The US Department of Energy has set a power constraint of 20 MW on each exascale machine. To achieve one exaflop within 20 MW, power must be used intelligently to maximize performance under a power constraint. Most production-level parallel applications that run on a supercomputer are tightly coupled. A naive approach to enforcing a power constraint on a parallel job is to divide the job's power budget uniformly across all of its processors. However, previous work has shown that a power-capped job suffers from performance variation across processors caused by manufacturing variations, which leads to sub-optimal overall performance. We propose a two-level, hierarchical, variation-aware approach to managing power at the machine level. At the macro level, PPartition partitions the machine's power budget across jobs, assigning a power budget to each job running on the system such that the machine never exceeds its power budget. At the micro level, PTune makes job-centric decisions that take the performance variation into account. For every moldable job, it determines the optimal number of processors, the selection of processors, and the distribution of the job's power budget across them, with the goal of maximizing the job's performance under its power budget. Our evaluations show that, at the micro level, PTune achieves a performance improvement of up to 29% compared to the naive approach. PTune does not lead to any performance degradation, yet frees up almost 40% of the processors for the same performance as that of the naive approach, under a hard power bound. PPartition achieves a throughput improvement of 5-35% compared to uniform power distribution.
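The abstract's point about uniform power division can be illustrated with a small sketch. This is not PTune's algorithm, which the abstract does not specify; it is a toy model under loudly stated assumptions: each processor's speed is taken to be linear in its power cap (speed = efficiency x cap), the efficiency values are made up, and the function names are hypothetical. Because a tightly coupled job advances at the pace of its slowest processor, giving less efficient processors a larger share of the same budget raises the job's speed in this model.

```python
# Toy model only: speed = efficiency * power_cap (an assumption, not from the paper).

def uniform_split(budget_w, efficiencies):
    """Naive approach: every processor receives the same power cap."""
    share = budget_w / len(efficiencies)
    return [share] * len(efficiencies)

def equalizing_split(budget_w, efficiencies):
    """Variation-aware split: caps are proportional to 1/efficiency,
    so every processor ends up running at the same speed."""
    inverse = [1.0 / e for e in efficiencies]
    total = sum(inverse)
    return [budget_w * x / total for x in inverse]

def job_speed(caps_w, efficiencies):
    """Tightly coupled job: the slowest processor sets the pace."""
    return min(e * p for e, p in zip(efficiencies, caps_w))

# Hypothetical per-processor efficiencies standing in for manufacturing variation.
efficiencies = [1.0, 0.9, 0.8, 0.5]
budget_w = 200.0

uniform_caps = uniform_split(budget_w, efficiencies)
equalized_caps = equalizing_split(budget_w, efficiencies)

uniform_speed = job_speed(uniform_caps, efficiencies)
equalized_speed = job_speed(equalized_caps, efficiencies)

print(f"uniform split speed:    {uniform_speed:.2f}")
print(f"equalizing split speed: {equalized_speed:.2f}")
```

Both splits spend exactly the same budget; only the distribution differs. Real processors have non-linear power-performance curves and minimum and maximum cap limits, so an actual tuner would need measured per-processor profiles rather than a single efficiency number.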

Similar resources

Self-tuning job scheduling strategies for the resource management of HPC systems and computational grids

In this thesis we develop and study self-tuning job schedulers for resource management systems. Such schedulers search for the best solution among the available scheduling alternatives in order to improve the performance of static schedulers. In two domains of real world job scheduling this concept is implemented. First of all, we study the scheduling in resource management software for high pe...

Basic Issues in Identification Scheme of a Self-Tuning Power System Stabilizer

Power system stabilizers have been widely used and successfully implemented for the improvement of power system damping. However, a fixed parameter power system stabilizer tends to be sensitive to variations in generator dynamics so that, for operating conditions away from those used for design, the effectiveness of the stabilizer can be greatly impaired. With the advent of microprocessor techn...

Exploiting performance counters to predict and improve energy performance of HPC systems

Hardware monitoring through performance counters is available on almost all modern processors. Although these counters are originally designed for performance tuning, they have also been used for evaluating power consumption. We propose two approaches for modelling and understanding the behaviour of high performance computing (HPC) systems relying on hardware monitoring counters. We evaluate th...

Power, Reliability, Performance: One System to Rule Them All

Traditionally, the emphasis of High Performance Computing (HPC) data centers and applications has been on performance. However, it is anticipated that future generation supercomputing systems will face major challenges in reliability, power management, and thermal variations. Disruptive solutions are required to optimize performance in the presence of these challenges. We believe that a smart p...

Cooperative Batch Scheduling for HPC Systems

The batch scheduler is an important piece of system software that serves as the interface between users and HPC systems. Users submit their jobs via a batch scheduling portal, and the batch scheduler makes a scheduling decision for each job based on its request for computing sources, i.e. core-hours. However, jobs submitted to HPC systems are usually parallel applications and their lifecycle consists of multiple...

Publication date: 2016
